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Abstract. Particle accelerators are invaluable tools for research in the basic and applied sciences, 
in fields such as materials science, chemistry, the biosciences, particle physics, nuclear physics and 
medicine. The design, commissioning, and operation of accelerator facilities is a non-trivial task, due 
to the large number of control parameters and the complex interplay of several conflicting design 
goals. 

We propose to tackle this problem by means of multi-objective optimization algorithms which 
also facilitate a parallel deployment. In order to compute solutions in a meaningful time frame we 
require a fast and scalable software framework. 

In this paper, we present the implementation of such a general-purpose framework for simulation- 
based multi-objective optimization methods that allows the automatic investigation of optimal sets 
of machine parameters. The implementation is based on a master/slave paradigm, employing several 
masters that govern a set of slaves executing simulations and performing optimization tasks. 

Using evolutionary algorithms and OPAL simulations as optimizer and forward solver in our 
framework, we present validation experiments and first results of multi-objective optimization prob- 
lems in the domain of beam dynamics. 

Key words, multi-objective optimization, beam dynamics, high performance computing 

1. INTRODUCTION. Particle accelerators play a significant role in many 
aspects of science and technology. Fields, such as material science, chemistry, the 
biosciences, particle physics, nuclear physics and medicine depend on reliable and 
effective particle accelerators, both as research but as well as practical tools. Achieving 
the required performance in the design, commissioning, and operation of accelerator 
facilities is a complex and versatile problem. Today, tuning machine parameters, e.g., 
bunch charge, emission time and various parameters of beamline elements, is most 
commonly done manually by running simulation codes to scan the parameter space. 
This approach is tedious, time consuming and can be error prone. In order to be 
able to reliably identify optimal configurations of accelerators we propose to solve 
large multi-objective design optimization problems to automate the investigation for 
an optimal set of tuning parameters. Observe that multiple and conflicting optimality 
criteria call for a multi-objective approach. 

We developed a modular multi-objective software framework (see Fig. where 
the core functionality is decoupled from the "forward solver" and the optimizer. To 
that end, we use a master/slave mechanism where a master process governs a set 
of slave processes given some computational tasks to complete. This separation al- 
lows to easily interchange optimizer algorithms, forward solvers and optimization 
problems. A "pilot" coordinates all efforts between the optimization algorithm and 
the forward solver. This forms a robust and general framework for massively par- 
allel multi-objective optimization. Currently the framework offers one concrete op- 
timization algorithm, an evolutionary algorithm employing a NSGAII selector [l]. 
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Fig. 1.1. Multi-objective framework: the pilot (master) solves the optimization problem specified 
in the input file by coordinating optimizer algorithms and workers running forward solves. 



Normally, simulation based approaches are plagued by the trade-of between level of 
detail and time to solution. We address this problem by using forward solvers with 
different time and resolution complexity. 

The framework discussed here, incorporates the following three contributions: 



1. 



Implementation of a scalable optimization algorithm capable of approximat- 
ing Pareto fronts in high dimensional spaces, 

design and implementation of a modular framework that is simple to use and 
deploy on large scale computation resources, and 

demonstrating the usefulness of the proposed framework on a real world ap- 
plication in the domain of particle accelerators. 



Here, we will focus on a real world application arising from PSI's SwiSSFEL 24 
activities, which have an immediate and a long-range impact, by optimizing important 
parameters of PSPs 250 MeV injector test facihty (and later for the fuU PSI 
SwiSSFEL). 

The next section introduces the notation of multi-objective optimization theory 
and describes the first implemented optimizer. In Section [3] we discuss the implemen- 
tation of the framework. We introduce the employed forward- solver in Section [4j A 
validation and a proof of concept application of a beam dynamics problem is discussed 
in Section [5] 

2. MULTI-OBJECTIVE OPTIMIZATION. Optimization problems deal 
with finding one or more feasible solutions corresponding to extreme values of objec- 
tives. If more than one objective is present in the optimization problem we call this 
a multi-objective optimization problem (MOOP). A MOOP is defined as 
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where f denotes the objectives (2.1), g the constraints ( 2.2 ) and x the design variables 




Fig. 2.1. Two competing objectives can not be optimal at the same time. Red dots represent 
Pareto optimal points, while the green square X4 is dominated ( exhibits a worse price per performance 
ratio than e.g. XjJ by all points on the blue curve (Pareto front). 



Often, we encounter conflicting objectives complicating the concept of optimality. 
To illustrate this, let us consider the problem of buying a car. Naturally, we try to 
bargain the lowest price for the best performance, e.g. maximal horsepower or minimal 



fuel consumption. This can be formulated as MOOP (2.4). 

min [price, —performance]'^ 

s.t. lowpr < price < highp^ (2.4) 
lowpo < performance < highp^ 

In general, it is not possible to get the maximal performance for the lowest price 
and a trade-off decision between performance and price has to be reached (see Fig- 
ure 2.1). Since not every choice is equally profitable for the buyer (for example, car 
2:4 costs as much as X2 but offers less performance), we pick trade-offs (red points) 
that are essentially "equally optimal" in both conflicting objectives, meaning, we can- 
not improve one point without hurting at least one other solution. This is known as 
Pareto optimality. The set of Pareto optimal points (blue curve) forms the Pareto 
front or surface. All points on this surface are considered to be Pareto optimal. 

Once the shape of the Pareto front has been determined the buyer can specify 
preference, balancing the features by observing the effect on the optimality criteria, 
converging to the preferred solution. This is called a posteriori preference specification 
since we select a solution after all possible trade-offs have been presented to us. An 
alternative is to specify preference a priori, e.g., by weighting (specifying preference 
before solving the problem) and combining all objectives into one and applying a 
single-objective method to solve the problem (yielding only one solution). In many 
situations preference is not known a priori and an a posteriori preference specification 
helps conveying a deeper understanding of the solution space. The Pareto front can 
be explored and the impact of a trade-off decision then becomes visible. 

Sampling Pareto fronts is far from trivial. A number of approaches have been 



proposed, e.g. evolutionary algorithms [10], simulated annealing 21 , swarm meth- 
ods 20 , and many more [9j[T2][l9j[26] . In the next section we briefly introduce the 
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Fig. 2.2. Schematic view of interplay between selector and variator. The selector ranks all indi- 
viduals in the population according to fitness and subsequently the variator uses the fittest individuals 
to produces new offspring. Finally, the new children are reintroduced in the population. 



theory of evolutionary algorithms used in the present work. 

2.1. Evolutionary Algorithms. Evolutionary algorithms are loosely based on 
nature's evolutionary principles to guide a population of individuals towards an im- 
proved solution by honoring the "survival of the fittest" practice. This "simulated" 
evolutionary process preserves entropy (or diversity in biological terms) by applying 
genetic operators, such as mutation and crossover, to remix the fittest individuals in a 
population. Maintaining diversity is a crucial feature for the success of all evolutionary 
algorithms. 

In general, a generic evolutionary algorithm consists of the following components: 

• Genes: traits defining an individual, 

• Fitness: a mapping from genes to a fitness value for each individual, 

• Selector: selecting the k fittest individuals of a population based on some 
sort of ordering, 

• Variator: recombination (mutations and crossover) operators for offspring 
generation. 

Applied to multi-objective optimization problems, genes correspond to design 
variables. The fitness of an individual is loosely related to the value of the objective 
functions for the corresponding genes. Figure [2?2] schematically depicts the connection 
of the components introduced above. The process starts with an initially random 
population of individuals, each individual with a unique set of genes and corresponding 
fitness, representing one location in the search space. In a next step the population is 
processed by the selector determining the k fittest individuals according to their fitness 
values. While the k fittest individuals are passed to the variator, the remaining n — k 
individuals are eliminated from the population. The Variator mates and recombines 
the k fittest individuals to generate new ofl^spring. After evaluating the fitness of all 
the freshly born individuals a generation cycle has completed and the process can 
start anew. 

Since there already exist plenty of implementations of evolutionary algorithms, we 
decided to incorporate the PISA library [l] into our framework. One of the advantages 
of PISA is that it separates variator from selector, rendering the library expandable 
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and configurable. Implementing a variator was enough to use PISA in our framework 
and retain access to all available PISA selectors. As shown in Figure [2?2j the selector 
is in charge of ordering a set of d-dimensional vectors and selecting the k fittest 
individuals currently in the population. The performance of a selector depends on 
the number of objectives and the surface of the search space. So far, we used an 
NSGA-II selector fTl\ exhibiting satisfactory convergence performance. 

The task of the variator is to generate offspring and ensure diversity in the popu- 
lation. The variator can start generating offspring once the fitness of every individual 
of the population has been evaluated. This explicit synchronization point defines an 
obvious bottleneck for parallel implementations of evolutionary algorithms. In the 
worst case some MPI threads are taking a long time to compute the fitness of the last 
individual in the pool of individuals to evaluate. During this time all other resources 
are idle and wait for the result of this one individual in order to continue to generate 
and evaluate offspring. To counteract this effect we call the selector already when two 
individuals have finished evaluating their fitness, lifting the boundaries between gener- 
ations and evaluating the performance of individuals. New offspring will be generated 
and MPI threads can immediately go back to work on the next fitness evaluation. By 
calling the selector more frequently (already after two offspring individuals have been 
evaluated) results in better populations since bad solutions are rejected faster. On 
the other hand, calling the selector more often is computationally more expensive. 

Our variator implementation uses the master/slave architecture, presented in the 
next section, to run as many function evaluations as possible in parallel. Additionally, 
various crossover and mutation policies are available for tuning the algorithm to the 
optimization problem. 

3. THE FRAMEWORK. Simulation based multi-objective optimization prob- 
lems are omnipresent in research and the industry. The simulation and optimization 
problems arising in such problems are in general very big and computationally de- 
manding. This motivated us to design a massively parallel general purpose framework. 
The key traits of such a design can be summarized as: 

• support any multi-objective optimization method, 

• support any function evaluator: simulation code or measurements, 

• offer a general description/specification of objectives, constraints and design 
variables, 

• run efficiently in parallel on current large-scale high-end clusters and super- 
computers. 



3.1. Related Work. Several similar frameworks, e.g. [13 16 22 23 , have been 
proposed. Commonly these frameworks are tightly coupled to an optimization algo- 
rithm, e.g. only providing evolutionary algorithms as optimizers. Users can merely 
specify optimization problems, but cannot change the optimization algorithm. Our 
framework follows a more general approach, providing a user-friendly way to intro- 
duce new or choose from existing built-in multi-objective optimization algorithms. 
Tailoring the optimization algorithm to the optimization problem at hand is an im- 
portant feature due to the many different characteristics of optimization problems 
that should be handled by such a general framework. As an example, we show 
how Pisa jl , an existing evolutionary algorithm library, was integrated with ease. 
Similarly, other multi-objective algorithms could be incorporated and used to solve 
optimization problems. 



The framework presented in 22 resembles our implementation the most, aside 
from their tight coupling with an evolutionary algorithm optimization strategy. The 
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Fig. 3.1. Schematic view of messages passed within the network between the three roles. The 
dashed cyan path describes a request (job j\ ) sent from Oi to the Pilot being handled by Wj . 
Subsequently the result r^ is returned to the requesting Optimizer (Oi). 



authors propose a plug-in based framework employing an island parallelization model, 
where multiple populations are evaluated concurrently and independently up to a 
point where some number of individuals of the population are exchanged. This is 
especially useful to prevent the search algorithm to get stuck in a local minimum. A 
set of default plug-ins for genetic operators, selectors and other components of the 
algorithms are provided by their framework. User-based plug-ins can be incorporated 
into the framework by implementing a simple set of functions. 

Additionally, as with simulation based multi-objective optimization, we can ex- 
ploit the fact that both the optimizer and simulation part of the process use a certain 
amount of resources. The ratio of work between optimizer and simulation costs can 
be reflected in the ratio of number of processors assigned to each task. This not only 
provides users with great flexibility in using any simulation or optimizer, but renders 
influencing the role assignment easy as well. 

3.2. Components. The basic assumption in simulation-based optimization is 
that we need to call an expensive simulation software component present in the con- 
straints or objectives. We divide the framework in three exchangeable components, as 
shown in Figure [3T] to encapsulate the major behavioral patterns of the framework. 

The Pilot component acts as a bridge between the optimizer and forward solvers, 
providing the necessary functionality to handle passing requests and results between 
the Optimizer and the Simulation modules. 

We implemented the framework in C++, utilizing features like template parameters 
to specify the composition of the framework (shown in Listing 1). 

Code Listing 1: Assembling the optimizer from components 

typedef OpallnputFileParser Input_t; 
typedef PisaVariator Opt_t; 
typedef OpalS imulat ion Sini_t; 

typedef Pilot< Input_t , Opt_t , Sim_t > Pllot_t; 



The framework provides "default" implementations that can be controlled via 
command line options. Due to its modular design, all components can be completely 
customized. 

Every available MPI thread will take up one of the three available roles (see 
Figure 1.1): one thread acts as Pilot, the remaining threads are divided amongst 
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Worker and Optimizer roles. Both, the Worker and the Optimizer can consist 
of multiple MPI threads to exploit parallelism. As shown in Figure |3.1| the Pilot 
is used to coordinate all "information requests" between the Optimizer and the 
Worker. An information request is a job that consists of a set of design variables 
(e.g. the genes of an individual) and a type of information it requests (e.g. function 
evaluation or derivative). The Pilot keeps checking for idle Worker and assigns jobs 
in the queue to any free Worker. Once the Worker has computed and evaluated 
the request its results are returned to the Optimizer that originally requested the 
information. 

After a thread gets appointed a role it starts a polling loop asynchronously check- 
ing for appropriate incoming requests. To that end a Poller interface helper class 
has been introduced. The Poller interface consists of an infinite loop that checks 
periodically for new MPI messages. Upon reception a new message is immediately 
forwarded to the appropriate handler, the onMessageO method. The method is 
called with the MPI_Status of the received message and a size_t value specifying 
different values depending on the value of the MPI_Tag. The Poller interface allows 
the implementation of special methods (denoted hooks) determining the behavior of 
the polling process, e.g. for actions that need to be taken after a message has been 
handled. Every POLLER terminates the loop upon receiving a special MPI tag. 

3.3. Implementing an Optimizer. All Optimizer implementations have to 
respect the API shown in Listing 2. 

Code Listing 2: Optimizer API 

virtual void initializeC) = 0; 

// Poller hooks 
virtual void setupPoll C) = 0; 
virtual void prePoll C) = 0; 
virtual void postPollO = 0; 
virtual void onStopO = 0; 

virtual bool onMessage CMPI_Status status, 
size_t length) = 0; 



All processors running an Optimizer component call the initialize entry point 
after role assignment in the Pilot. The implementation of initialize must set up 
and start the poller and the optimization code. Since an optimizer derives from the 
Poller interface, predefined hooks can be used to determine the polling procedure. 
Hooks can be implemented as empty methods, but the onMessage implementation 
should reflect the optimization part of the protocol for handling events from the 
Pilot. A special set of communicator groups serves as communication channels to 
the Pilot and its job queue and if existing to threads supporting the Optimizer 
component. 

3.4. Implementing a Forward Solver. In most cases Simulation implemen- 
tations are simple wrappers to run an existing "external" simulation code using a set 
of design variables as input. As for the Optimizer component there exists a base 
class, labeled Simulation as common basis for all Simulation implementations. In 
addition, this component also inherits from the Worker class, already implementing 
the polling protocol for default worker types. As shown in the API in Listing 3 the 
Worker class expects an implementation to provide implementations for those three 
methods. 

Code Listing 3: Simulation API 



7 



virtual void runO = 0; 

virtual void collectResults () = 0; 

virtual reqVarContainer_t getResultsC) 



= 0; 



First, upon receiving a new job, the Worker will call the run method on the Sim- 
ulation implementation. This expects the Simulation implementation to run the 
simulation in a blocking fashion, meaning the method call blocks and does not return 
until the simulation has terminated. Subsequently, the Worker calls collectResults, 
where the Simulation prepares the result data, e.g., parsing output files, and stores 
the requested information in a reqVarContainer_t data structure. Finally, the re- 
sults obtained with getResults are sent to the Pilot. As before, the serialized 
data is exchanged using MPI point-to-point communication using a specific set of 
communicators . 

3.5. Specifying the Optimization Problem. We aimed at an easy and ex- 
pressive way for users to specify multi-objective optimization problems. Following 
the principle of keeping metadata (optimization and simulation input data) together, 
we decided to embed the optimization problem specification in the simulation input 
file by prefixing it with special characters, e.g., as annotations prefixed with a special 
character. In some cases it might not be possible to annotate the simulation input 
file. By providing an extra input file parser, optimization problems can be read from 

stand-alone files. 

To allow arbitrary constraints and objective expressions, such as 

name: OBJECTIVE, 

EXPR="5 * average (42 . , "measurement.dat") + ENERGY"; 



we implemented an expression parser using Boost SpiritQ In addition to the parser, 

we need an evaluator able to evaluate an expression, given a parse tree and variable 
assignments to an actual value. Expressions arising in multi-objective optimization 
problems usually evaluate to booleans or floating point values. The parse tree, also 
denoted abstract syntax tree (AST), is constructed recursively while an expression 
is parsed. Upon evaluation, all unknown variables are replaced with values, either 
obtained from simulation results or provided by other subtrees in the AST. In this 
stage, the AST can be evaluated bottom-up and the desired result is returned after 
processing the root of the tree. 

To improve the expressive power of objectives and constraints we introduced a 
simple mechanism to define and call custom functions in expressions. Using simple 
functors, e.g., as the one shown in Listing 4 to compute an average over a set of data 
points, enriches expressions with custom functions. Custom function implementa- 
tions overload the () parenthesis operator. The function arguments specified in the 
corresponding expression are stored in a std: : vector of Boost variant^that can be 
booleans, strings or floating point values. 

Code Listing 4: Simple Average Functor 

struct avg { 

double operator OC 

cl i ent :: funct i on :: argument s_t args) const { 

double limit = boost :: get <double > C args [0] ) ; 
std :: string filename = 



^ http : //boost- spirit ■ com/ 

^ www. boost . org/doc/html/variant .html 
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boost : :get<std; : string >(args [1]) 



double sum = 0.0; 

forCint i = 0; i < limit; i++) 

sum += getDat aFromFi le ( f ilename , i); 

return sum / limit; 

} 

}; 



All custom functions are registered with expression objects. This is necessary 
to ensure that expressions know how they can resolve function calls in their AST. 
As shown in Listing 5 this is done by creating a collection of Boost function^ cor- 
responding to the available custom functions in expressions and passing this to the 
Pilot. 

Code Listing 5: Creating function pointer for registering functor 

f unc t i onD i c t i onary_ t funcs; 
client : ; function : : type f f ; 
f f = average ( ) ; 

funcs . insert Cstd: :pair<std: : string , client : : f unc t ion : : t ype > 
C " my _ aver age_name " , ff)); 



A set of default operators, corresponding to a mapping to C math functions, is 
included in the dictionary by default. This enables an out of source description of 
optimization problem containing only simple math primitives. 

3.6. Parallelization. The parallelization is defined by a mapping of the roles 
introduced above to available cores. Command-line options allow the user to steer 
the number of processors used in worker and optimizer groups. Here, we mainly use 
the command-line options to steer the number of processors running a forward solver. 

The parallelization will be explained and benchmarked in detail in a forthcoming 
publication. One major disadvantage of the master/slave implementation model is 
the fast saturation of the network links surrounding the master node. In |5 authors 
observe an exponential increase in hot-spot latency with increasing number of workers 
that are attached to one master process. Clearly, the limiting factor is the number of 
outgoing links of a node in the network topology and already for a few workers the 
links surrounding a master process are subject to congestions. This effect is amplified 
further by large message sizes. 

To that end we implemented a solution propagation based on rumor networks (see 
e.g. (4|[7|) using only one-sided communication. This limits the number of messages 
sent over the already heavily used links surrounding the master node and at the same 
time helps to prevent the use of global communication. Using information about the 
interconnection network topology and the application communication graph the task 
of assigning roles helps to further improve the parallel performance. 

4. FORWARD SOLVER. The framework contains a wrapper implementing 
the API mentioned in Listing 3 for using OPAL [2] as the forward solver. OPAL pro- 
vides different trackers for cyclotrons and linear accelerators with satisfactory parallel 
performance [3]. Recently we introduced a reduced envelope model ^18j into OPAL 
reducing time to solution by several orders of magnitude. 

With access to the OPAL forward solver the framework is able to tackle a mul- 
titude of optimization problems arising in the domain of particle accelerators. By 
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Fig. 5.1. The hypervolume for a two-objective optimization problem corresponds to the accu- 
mulated area of all dashed rectangles spanned by all points in the Pareto set and arbitrary origin 

Po- 



abiding to the API from Listing 3 it becomes trivial to add new forward solvers to the 
framework (see Appendix A). If the objectives and constraints are simple arithmetical 
expressions, the FunctionEvaluator simulator can be used. Using functors and the 
default expression primitives already powerful multi-objective optimization problems 
can be specified, i.e. the benchmark problem presented in 
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1 — exp —1 { X 



-l<x^<l, 1,2,3. 




Using the default expression primitives this can be stated in an input file as: 



dl 
d2 
d3 



DVAR, VARIABLE="xl", 
DVAR, VARIABLE="x2" , 
DVAR, VARIABLE="x3", 



LQWERB0UND="-1.0" 
LQWERB0UND="-1.0" 
LDWERB0UND="-1.0" 



UPPERB0UND="1.0" 
UPPERB0UND="1.0" 
UPPERB0UND="1.0" 



objl: OBJECTIVE, 

EXPR="1- exp(-l * (sq(xl - l/sqrt(3)) + sq(x2 - l/sqrt(3)) + sq(x3 - 1/sqrt (3) ) ) ) " ; 
obj2: OBJECTIVE, 

EXPR="1- exp(-l * (sq(xl + l/sqrt(3)) + sq(x2 + l/sqrt(3)) + sq(x3 + 1/sqrt (3) )))" ; 

objs: OBJECTIVES = (objl, obj2) ; 
dvars: DVARS = (dl , d2, d3) ; 
constrs: CONSTRAINTS = (); 

opt: OPTIMIZE, QBJECTIVES=objs , DVARS=dvars, CONSTRAINTS=constrs ; 



5. EXPERIMENTS. In this section we present numerical results of the vali- 
dation benchmark and discuss a proof of concept application in the domain of particle 
accelerators. 

5.1. Optimizer Validation. To ensure that the optimizer works correctly we 
solved the benchmark problem ( |4.1[ ). To that end, we use a metric for comparing 
the quality of a Pareto front. Given a point in the Pareto set, we compute the m 
dimensional volume (for m objectives) of the dominated space, relative a chosen origin. 
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We visualize this for 2 objectives in Figure |5.1| For further information and details 



of the implementation see 27 . Figure 5.2 and the corresponding hypervolume values 
in Table 5.1 The reference Pareto front is clearly very well approximated. It took 
a total of 1100 function evaluations to perform this computation. The hypervolume 
of the reference solution (0.6575) for our benchmark was computed by sampling the 
solution provided in [17| . 

Table 5.1 

Convergence of benchmark problem with errors relative to hypervolume of sampled reference 
solution. 



tot. fimction 
evaluations 



hyper volume relative error 



100 


0.859753 


3.076 X 10- 


-1 


200 


0.784943 


1.938 X 10" 


-1 


500 


0.685183 


4.210 X 10' 


-2 


900 


0.661898 


6.689 X 10" 


-3 


1100 


0.657615 


1.749 X 10" 


-4 



From Table 5.1 we deduce that we achieved satisfactory convergence to the sam- 
pled reference Pareto front after 1000 (plus the additional 100 evaluations for the 
initial population) function evaluations. 

5.2. Ferrario Matching Point. As a verification and proof of concept we re- 
produce the Ferrario matching point discovered by Ferrario et al. [15| , by formulating 
the problem as a multi-objective optimization problem. Using the low- dimensional 
and fast nature of their new simulation code Homdyn fl4' , an extensive beam dynam- 
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rmSxj peak 

Fig. 5.3. Illustration of the Ferrario matching criteria: beam emittance attains a maximum 
and rms beamsize a minimum at the entrance to the first accelerating traveling wave structure. 



15 was the discovery of a novel 



ics study was conducted. 

One of the results of the study presented in 
working point. The authors noticed that the second emittance minimum can profit 
from the additional emittance compensation in the accelerating traveling wave struc- 
ture ensuring that the second emittance minimum occurs at a higher energy. This 
property is attained if the beam emittance has a maximum and the root mean square 
(rms) beam size has a minimum at the entrance of the first accelerating traveling wave 
structure. This behavior is illustrated in Figure [573] 

By artificially reproducing this working point as the solution of a multi-objective 
optimization problem given in equations ( 5.1 1 to ( 5.9 1, we demonstrate the automation 
of discovering optimal beam dynamics behaviors given a set of desired objectives. 



s.t. 



[ArmSa^^pcak = |3.025 - rms^^^pcakl, 


(5.1) 


A^a^.pcak — |3.025 £^2;,pcak|; 


(5.2) 


|rm.Sa;.pcak_pos £^2;,pcak_pos |] 


(5.3) 


q = 200 [pC] 


(5.4) 


VoltRF = 100 [MV/m] 


(5.5) 




(5.6) 


KSl < KSrf < KSu 


(5.7) 


LAGl < LAGrf < LAGu 


(5.8) 


Azlks < Azks < Az^/Ks 


(5.9) 



The first two objectives minimize the distance from the position of the current 
minimum peak to the expected peak location at 3.025 m for transverse bunch size 



(beam waist) and emittance (see Figure 5.3). The third objective (5.3) adds a condi- 
tion preferring solutions that have their emittance and rms peak locations at the same 



z-coordinate. Equations (5.4) and (5.5) define constraints for initial conditions for the 



simulation: charge, gun voltage and laser spot size. Design variables given in (5.6) to 



(5.9) correspond to field strengths of the first focusing magnet, its displacement, and 
the phase of the gun. 

In order to compute the peaks, we employed an additional Python script. This 
script was called in the OPAL input file, after the simulation finished using the SYSTEM 



functionality. Once the peaks (in a given range) were located, the two objectives (5.1 ) 
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Table 5.2 

Initial conditions for the envelope tracker. 



name 


initial value 


Gun voltage 


100 MV 


Bunch charge 


200 pC 


-DTBcamlinc 


1.5 ps 


Number of shoes 


400 



and ( 5.2 ) were computed and their values written into corresponding files. The custom 
f romFile functor allows us to access the values stored in the peak finder Python script 
result files 

rmsx: OBJECTIVE, EXPR="f romFile ("rms_x-err.dat " , "var")"; 
emitx: OBJECTIVE, EXPR="fromFiIe("emit_x-err .dat" , "var")"; 
match: OBJECTIVE, EXPR="fabs(fromFile("emlt_x-peak.dat" , "var") - 

fromFile("rms_x-peak.dat" , "var")) " ; 

The design variables and the assembly of the multi-objective optimization problem 
can be included in the OPAL input file as shown below: 



dl 
d2 
d3 
d4 



DVAR, VARIABLE="SIGX", LOWERBOUND="0 . 00025" , UPPERBOUND="0 . 00029" ; 

DVAR, VARIABLE="FINDl_MS0L10_i" , LOWERBOUND="110" , UPPERBOUND=" 120" 

DVAR, VARIABLE="D_LAG_RGUN" , LOWERBOUND="-0. 1" , UPPERBOUND="0. 1" ; 

DVAR, VARIABLE="D_SOLPOS" , LOWERBDUND="-0 . 05" , UPPERBDUND="0 . 05" ; 



objs: OBJECTIVES = (rmsx, emitx); 
dvars : DVARS = (dl , d2 , d3 , d4) ; 
constrs: CONSTRAINTS = () ; 

opt: OPTIMIZE, DBJECTIVES=objs, DVARS=dvars , 

CONSTRAINTS=constrs ; 

All numerical experiments in this sections were executed on the Felsim cluster 
at PSI. The Felsim cluster consists of 8 dual quad-core Intel Xeon processors at 
3.0 GHz and has 2 GB memory per core with a total of 128 cores. The nodes are 
connected via Infiniband network with a total bandwidth of 16 GB/s. 

5.3. Convergence Study. The envelope tracker, mentioned in the previous 
chapter, was chosen as the forward solver. We performed a beam convergence study 
in order to tune the simulation input parameters to achieve the best trade-off between 
simulation accuracy and time to solution. These parameters include the number of 
slices (NSLICE) used for the envelope-tracker simulations, simulation timestep (DT) 
and gun timestep (DTGUN). 

Before the simulation can be executed a number of initial beam optics parameters 



have to be defined in an input file. Table 5.2 shows the values of these parameters for 



the envelope-tracker. All simulations were performed up to 12.5 m of the SwiSSFEL 



250 MeV injector 24 beam line, with energies reaching up to 120 MeV. 

The parameter that affects the performance most is the number of slices. We 
scanned the range from 100 to 1000 slices to determine the minimal number of slices 
required for stable results using various timesteps. The results (for 100, 400 and 800 



slices) of this scan are shown in Figure 5.4 Using this data we settled for 400 slices 



- increasing the slice number only minimally improves convergence of the results, 
therefore using more slices is inefficient. 

In a next step the influence of different time steps was examined. To that end 
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Fig. 5.4. Envelope-tracker with different number of slices and simulation time steps. 

Table 5.3 
The design variables for individual 3. 



name 



value 



ax 0.262 mm 

Solenoid displacement 28.8467 mm 

Gun voltage lag 0.0159067 MV 

Solenoid current 111.426 A 



a series of optimization runs with 100, 400 and 800 slices and varying timestep was 
performed. Figure [53] shows the Pareto fronts for 400 slices respectively using different 
timesteps. As expected, increasing the number of slices while lowering the timestep 
produces more detailed results. 

5.4. Optimization Results. Each of the 40 points on the Pareto front, shown 
in Figure |5.5[ represents an optimal solution, where emittance and beamsize values 
are compromised to achieve the best agreement with the Ferrario matching point. We 
selected individual 3 based on a comparison of the emittance and beamsize characteris- 
tics of all solutions and by retaining the feasibility of the beam line optics parameters. 
The design variables, emittance and beamsize of the selected solution are shown in 



Table 5.3 and Figure 5.6 With the multi-objective optimization framework we attain 



the same working point as reported in 24 



Using the input parameters of the selected solution, we performed a stability 
analysis by varying the slice number and the time step for both the gun and the beam 
line. Figure [5^ shows that the exit emittance stabilizes for 400 slices and various time 
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Fig. 5.5. Pareto front for the lOOOi/i generation with 40 individuals using 4OO slices (interest- 
ing region magnified), a simulation timestep of 1.5 ps. The individual 3 was selected for further 
investigations. 




Fig. 5.6. Beamsize and emittance of individual 3 . 



steps. No difference between 800 and 400 slices is visible as their minimum maximum 
extension seems to be in the same range of 0.024 mm mrad. 

For validation purposes we compared the results of the envelope-tracker using 
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Fig. 5.7. Comparison 3D-tracker versus envelope-tracker in case for rms^ and ex- 



the analytical space charge model with the OPAL 3D macro particle tracker. The 
benchmark was run on the first 12.5 meters of the SwiSSFEL 250 MeV injector. The 
results for both rms beamsize and emittance are shown in Figure [STzl A good agree- 
ment between the two codes can be observed. The difference of the larger emittance 
along the solenoids in case of 3D tracker that is not seen by the envelope-tracker is 
due to the different definition of the particle momenta (canonical vs. mechanical). 
Both trackers agree within acceptable limits [sj. 

6. CONCLUSIONS. We presented a general-purpose framework for solving 
multi-objective optimization problems. Its modular design simplifies the application 
to simulation-based optimization problems for a wide range of problems and allows to 
exchange the optimization algorithm. In a recent work, we successfully implemented 
and integrated another optimization algorithm, originally presented in 25 , into our 
framework. The flexibility of being able to adapting both ends of the optimization 
process, the forward solver and the optimization algorithm simultaneously not only 
leads to broad applicability but it facilitates the ability to tailor the optimization 
strategy to the optimization problem as well. 

In this paper we applied the framework to reproduce the important and well 
known Ferrario matching condition for the 250 MeV injector of PSI's SwiSSFEL. 
The beam size and emittance optimization was successful and emittance damping 
is observed in the accelerating cavity. Also, the influence of the number of slices 
employed by the envelope tracker with respect to convergence was investigated. We 
found that the optimal slice number is around 400 for the problem addressed here, 
producing the best trade-off between computational cost and accuracy of the solution 
of the simulation. Similarly, the gun and beam line time step was investigated con- 
firming the convergence of the results. This first study shows that the framework is 
ready to tackle problems arising in the domain of beam dynamics. 

Interestingly, the computation of a good approximation of the Pareto front is 
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already feasible on a small cluster using only a modest number of MPI threads. For 
1000 generations consisting of 2048 feasible and 8354 infeasible function evaluations 
(simulations) the optimizer needed 845 minutes on the Felsim cluster on 16 threads. 
When we double the number of threads to 32 (Felsim allows a maximum of 64 threads 
per job) the runtime improves to 302 minutes (2048 accepted and 5426 infeasible 
individuals). Notice that the runtime can vary due to the random nature of the 
process. Nevertheless we see a reasonable reduction when using more cores to solve 
the optimization problem and the parallel performance will be further evaluated in a 
forthcoming publication. 

In contrast to approaches that are tightly coupled to the optimization algorithm, 
the range of possible applications is much wider. Even in cases where the mathemat- 
ical model of the forward solver is not known exactly, fixed or on the real time mea- 
surements can be used to guide the search for the Pareto optimal solutions. Finally, 
combining the presented multi-objective optimization framework with a physicist long 
standing experience in the field provides a solid basis for better understanding and 
improving the decision making process in the design and operation of particle accel- 
erators. 

7. ACKNOWLEDGMENT. The authors thank the SwiSSFEL team for con- 
tributing to the formulation of optimization problems. 

Appendix A. Forward Solver Implementation. 

A simple implementation, e.g. using a Python forward solver, is given below in 
Listing 6. We assume the Python script executes a simulation and dumps results in 
the SDDS file format ^ . Utility classed are used to parse the result data and fill the 
appropriate structures returned to the optimizer algorithm. 

Code Listing 6: Simple Simulation Wrapper 

#include <map> 
#iiiclude <string> 
#include <vector> 

#lnclude " Ut il / Types . h " 
#include " Ut il / CmdArgument s . h " 
#include " Simulation/Simulation. h" 

#include " boo st / smart _ptr . hpp " 

class S impleS imulat ion : public Simulation { 

public ; 

S impleS imulat ion (Expressions : :Named_t 
Expressions ; : Named_t 
Param_t params , std : 
CmdArguments_t args ) 

" SimpleSimulation () ; 

/// Calls simulation through Python wrapper and returns when simulation 
/// has either failed or finished, 
void run () ■( 

// user implements wrappers 

prepare. input _f il e C) ; 

run_python_simulation C) ; 

} 

/// Parse SDDS stat file and build up requested variable dictionary, 
void colle ctResult s () -[ 

Expressions : : Named_t : : iterator it ; 

forCit = ob j e ct ive s _. begin C ) ; it != ob j e c t ives_ . end () ; it++) { 



objectives , 
constraints , 
: string name, MPl_Comm comm , 
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Expressions::Expr_t *objective = it->second; 

// find out which variables we need in order to evaluate the 
// ob j ective 

variableDictionary_t variable_dictionary ; 

std::set<std;:string> req_vars = objective->getReqVarsC); 
ifCreq_vars.sizeC) != 0) { 

boost : ; scoped_ptr <SDDSReader> sddsr (new SDDSReader (fn) ) ; 

try { 

sddsr ->parseFile C) ; 
> catchCOptPilotException &e) { 

std : ; cout << "ExceptionuWhileuparsinguSDDSufile : u" 
<< e . what () << std::endl; 

break ; 

> 

// get all the required variable values from the stat file 
foreach(std: : string req_var , req_vars) { 

if C variable_dictionary . count (req_var) == 0) { 

try { 

double value = 0.0; 

sddsr->getValue(l /*atTirae+/, req_var, value); 
variable_dictionary . insert C 

std: :pair<std: :string, double>Creq_var , value)); 
} catchCOptPilotException &e) -[ 

std:: cout << "Except i on □ whi le ugettinguvaluou" 
<< "fromuSDDSuf ile : u" << e.whatO 
<< std : : endl ; 

} 



// and evaluate the expression using the built dictionary of 

// variable values 

Expressions ::Result_t result = 

objective->evaluate Cvariable_dictionary) ; 

std : : vector<double > values ; 

values .push_back (boost : :get<0>(result)) ; 

bool is_valid = boost ::get<l>(result); 

reqVarInfo_t tmps = {EVALUATE, values, is_valid}; 
requestedVars_ . insert C 

std : :pair<std : : string , reqVarInfo_t >(it->first , tmps)) ; 



/// returns container containing all requested variables with results 
reqVarContainer_t getResultsC) { return requestedVars_; } 



private : 



/// holds solutions returned to the optimizer 
reqVarContainer_t request edVars_ ; 

Expressions : : Named_t objectives. ; 
Expressions : : Named_t constraints. ; 



MPI_Comm comm_ 



>; 

#endif 
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